Fixed #33308 -- Added support for psycopg version 3 #15687

Merged: 1 commit merged on Dec 15, 2022


apollo13
Member

apollo13 commented May 12, 2022

What did I do? I took
https://github.com/dvarrazzo/django-psycopg3-backend, blackified it, and
ported over most (all?) of the new commits. I am now opening this on GitHub to
be able to diff it nicely and to start a discussion about whether we can
easily support psycopg2 and psycopg3 from the same codebase (I think we can).


I (i.e. @felixxm) have the following plan to move this forward:

@apollo13
Member Author

Looking through the codebase, there are quite a few areas where it would probably be easier to just assume that if psycopg3 is installed, we want to use it. This might make testing a bit more fun (an extra environment), but realistically speaking we want to be on psycopg3 only in the long run anyway…
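A minimal sketch of the "use psycopg 3 if it is installed, otherwise fall back to psycopg2" idea. The helper name `pick_driver` is hypothetical and not part of Django; the PR's own compatibility shim lives in `django/db/backends/postgresql/psycopg_any.py` and may be structured differently.

```python
import importlib
from types import ModuleType
from typing import Iterable, Optional, Tuple


def pick_driver(
    candidates: Iterable[str] = ("psycopg", "psycopg2"),
) -> Tuple[Optional[str], Optional[ModuleType]]:
    """Return (name, module) for the first importable driver, in order of preference.

    Hypothetical helper: psycopg 3 is listed first, so it wins whenever it is installed.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Not installed (or unusable); try the next candidate.
            continue
    return None, None


name, driver = pick_driver()
if name is None:
    print("Neither psycopg nor psycopg2 is installed.")
else:
    print(f"Using {name}")
```

Listing the preferred module first keeps the "assume psycopg 3 when available" policy in a single place.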

@apollo13
Member Author

We are down to three failures :)

apollo13 force-pushed the psycopg3 branch 5 times, most recently from d438338 to 03d5568 on May 13, 2022 14:27
Sponsor Member

adamchainz left a comment

Nice work picking this up, @apollo13. I've had a quick look and made some comments. I hope to find time to test this with a client project's test suite in a week or so.

django/db/backends/postgresql/schema.py (outdated, resolved)
django/db/models/functions/comparison.py (outdated, resolved)
django/db/utils.py (outdated, resolved)
django/db/backends/postgresql/base.py (outdated, resolved)
django/db/backends/postgresql/base.py (outdated, resolved)
@apollo13
Member Author

@timgraham I'd love it if you could look over this and maybe test your CockroachDB backend against it. It would be great if I don't completely break it :)

timgraham added a commit to timgraham/django-cockroachdb that referenced this pull request May 21, 2022
@timgraham
Member

Besides the issue in SchemaLoggerTests, there's one other regression with psycopg2. (I'll work through the failures with psycopg3 later).

======================================================================
FAIL: test_orders_nulls_first_on_filtered_subquery (ordering.tests.OrderingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tim/code/django/tests/ordering/tests.py", line 199, in test_orders_nulls_first_on_filtered_subquery
    self.assertQuerysetEqualReversible(
  File "/home/tim/code/django/tests/ordering/tests.py", line 129, in assertQuerysetEqualReversible
    self.assertSequenceEqual(queryset.reverse(), list(reversed(sequence)))
AssertionError: Sequences differ: <QuerySet [<Author: Name 3>, <Author: Name 1>, <Author: Name 2>]> != [<Author: Name 2>, <Author: Name 1>, <Author: Name 3>]

First differing element 0:
<Author: Name 3>
<Author: Name 2>

- <QuerySet [<Author: Name 3>, <Author: Name 1>, <Author: Name 2>]>
? ----------               ^                                   ^  -

+ [<Author: Name 2>, <Author: Name 1>, <Author: Name 3>]
?                ^                                   ^

The old SQL:

 SELECT DISTINCT "ordering_author"."id",
                "ordering_author"."name",
                "ordering_author"."editor_id",
                (SELECT Max(U0."pub_date") AS "last_date"
                 FROM   "ordering_article" U0
                 WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                          AND Upper(U0."headline" :: text) LIKE Upper(
                              '%Article%') )
                 GROUP  BY U0."author_id") AS "last_date",
                (SELECT Max(U0."pub_date") AS "last_date"
                 FROM   "ordering_article" U0
                 WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                          AND Upper(U0."headline" :: text) LIKE Upper(
                              '%Article%') )
                 GROUP  BY U0."author_id") IS NULL,
                (SELECT Max(U0."pub_date") AS "last_date"
                 FROM   "ordering_article" U0
                 WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                          AND Upper(U0."headline" :: text) LIKE Upper(
                              '%Article%') )
                 GROUP  BY U0."author_id")
FROM   "ordering_author"
ORDER  BY (SELECT Max(U0."pub_date") AS "last_date"
           FROM   "ordering_article" U0
           WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                    AND Upper(U0."headline" :: text) LIKE Upper('%Article%') )
           GROUP  BY U0."author_id") IS NULL,
          (SELECT Max(U0."pub_date") AS "last_date"
           FROM   "ordering_article" U0
           WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                    AND Upper(U0."headline" :: text) LIKE Upper('%Article%') )
           GROUP  BY U0."author_id") DESC  

The new SQL:

 SELECT DISTINCT "ordering_author"."id",
                "ordering_author"."name",
                "ordering_author"."editor_id",
                (SELECT Max(U0."pub_date") AS "last_date"
                 FROM   "ordering_article" U0
                 WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                          AND Upper(U0."headline" :: text) LIKE Upper(
                              '%Article%') )
                 GROUP  BY U0."author_id") AS "last_date",
                (SELECT Max(U0."pub_date") AS "last_date"
                 FROM   "ordering_article" U0
                 WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                          AND Upper(U0."headline" :: text) LIKE Upper(
                              '%Article%') )
                 GROUP  BY U0."author_id") IS NULL,
                (SELECT Max(U0."pub_date") AS "last_date"
                 FROM   "ordering_article" U0
                 WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                          AND Upper(U0."headline" :: text) LIKE Upper(
                              '%Article%') )
                 GROUP  BY U0."author_id")
FROM   "ordering_author"
ORDER  BY 5 DESC  

It might be that because CockroachDB has DatabaseFeatures.nulls_order_largest = False (unlike PostgreSQL), the loss of the second subquery in the ORDER BY is problematic.
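For context, a hedged sketch of how a NULLS FIRST/LAST ordering can be emulated when a backend cannot use the NULLS modifier: sort first on an IS NULL / IS NOT NULL flag, then on the expression itself. The old SQL above has exactly this two-key shape, while the new SQL collapses it to a single positional key (ORDER BY 5). The function `order_by_sql` and its parameters are hypothetical; Django's real implementation lives in its OrderBy expression and also consults other backend feature flags.

```python
def order_by_sql(
    expr_sql: str,
    descending: bool,
    nulls_last: bool = False,
    nulls_first: bool = False,
    supports_nulls_modifier: bool = True,
) -> str:
    """Build one ORDER BY term, emulating NULLS FIRST/LAST if needed (illustrative)."""
    direction = "DESC" if descending else "ASC"
    if supports_nulls_modifier:
        if nulls_last:
            return f"{expr_sql} {direction} NULLS LAST"
        if nulls_first:
            return f"{expr_sql} {direction} NULLS FIRST"
        return f"{expr_sql} {direction}"
    # Emulation: FALSE sorts before TRUE, so "expr IS NULL" (ascending) puts
    # non-NULL rows first, and "expr IS NOT NULL" puts NULL rows first.
    # Both keys are required; dropping the second one loses the actual ordering.
    if nulls_last:
        return f"{expr_sql} IS NULL, {expr_sql} {direction}"
    if nulls_first:
        return f"{expr_sql} IS NOT NULL, {expr_sql} {direction}"
    return f"{expr_sql} {direction}"


print(order_by_sql('"last_date"', descending=True, nulls_last=True,
                   supports_nulls_modifier=False))
```

If the compiler replaces such a term with a single column alias or position, only the first (boolean) key survives, which matches the reordering seen in the failing test.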

django/db/backends/postgresql/base.py (outdated, resolved)
django/db/backends/postgresql/operations.py (outdated, resolved)
django/db/backends/postgresql/psycopg_any.py (outdated, resolved)
apollo13 changed the title from "Draft PR for psycopg3 support." to "Draft PR for psycopg3 support. Fixes #33308" on Jun 5, 2022
@apollo13
Member Author

apollo13 commented Jun 5, 2022

> It might be that because CockroachDB has DatabaseFeatures.nulls_order_largest = False (unlike PostgreSQL), the loss of the second subquery in the ORDER BY is problematic.

This is probably a result of https://github.com/django/django/pull/15687/files#diff-f58de2deaccecd2d53199c5ca29e3e1050ec2adb80fb057cdfc0b4e6accdf14fR753-R769, but if it loses the second ORDER BY term, then this might be a problem in the linked code and not in CockroachDB.

@apollo13
Member Author

apollo13 commented Jun 5, 2022

@timgraham I can reproduce your query issue when I set supports_order_by_nulls_modifier = False (which it probably is on old cockroachdb versions: https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/features.py#L60 -- did you test against an old version?). That said, the combination of supports_order_by_nulls_modifier = False & supports_order_column_alias = True certainly shows there is a problem in the new code.

apollo13 force-pushed the psycopg3 branch 3 times, most recently from d074e9d to b641d99 on June 7, 2022 06:33
timgraham added a commit to timgraham/django-cockroachdb that referenced this pull request Jun 7, 2022
@timgraham
Member

> @timgraham I can reproduce your query issue when I set supports_order_by_nulls_modifier = False (which it probably is on old cockroachdb versions: https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/features.py#L60 -- did you test against an old version?). That said, the combination of supports_order_by_nulls_modifier = False & supports_order_column_alias = True certainly shows there is a problem in the new code.

Yes, the failure is only present on older versions of CockroachDB.

@apollo13
Member Author

buildbot, test on oracle.

@auvipy
Contributor

auvipy commented Jul 11, 2022

Wow, all tests passing! Great work.

timgraham added a commit to timgraham/django-cockroachdb that referenced this pull request Jul 22, 2022
timgraham added a commit to timgraham/django-cockroachdb that referenced this pull request Aug 5, 2022
@pauloxnet
Contributor

pauloxnet commented Dec 13, 2022

I tested this branch with different versions of psycopg packages without any errors.
I've also recorded the timings below:

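| Psycopg package       | Tests | Time     | DB setup | DB teardown | Total    |
|-----------------------|-------|----------|----------|-------------|----------|
| psycopg2 2.9.5        | 16130 | 191.615s | 113.857s | 7.279s      | 315.014s |
| psycopg[c] 3.1.5      | 16130 | 193.814s | 120.468s | 7.407s      | 323.853s |
| psycopg[c] 3.1.4      | 16130 | 194.309s | 114.014s | 7.319s      | 318.004s |
| psycopg[binary] 3.1.4 | 16130 | 195.484s | 116.064s | 7.048s      | 321.018s |
| psycopg 3.1.4         | 16130 | 201.395s | 115.534s | 7.941s      | 327.230s |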
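A quick calculation with the totals from the table, expressing each psycopg 3 run relative to the psycopg2 2.9.5 baseline. These are single runs on one machine, so differences of a few percent may simply be noise.

```python
# Total wall-clock times (seconds) copied from the table above.
totals = {
    "psycopg2 2.9.5": 315.014,
    "psycopg[c] 3.1.5": 323.853,
    "psycopg[c] 3.1.4": 318.004,
    "psycopg[binary] 3.1.4": 321.018,
    "psycopg 3.1.4": 327.230,
}

baseline = totals["psycopg2 2.9.5"]

# Percentage overhead of each run's total versus the psycopg2 baseline.
overhead = {
    name: (total - baseline) / baseline * 100
    for name, total in totals.items()
}

for name, pct in overhead.items():
    print(f"{name:<22} {pct:+.1f}%")
```

With these numbers the psycopg 3 runs come out roughly 1% to 4% slower in total, with the pure-Python psycopg 3.1.4 run being the slowest.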

@dvarrazzo
Contributor

FYI, psycopg 3.1.5 has just been released, including several performance improvements.

Also FYI, I have noticed a relatively large performance impact when accessing cursor.description. If you are using the description attribute just to extract the column names, you might want to use cursor.pgresult instead. See psycopg/psycopg#457 for details.
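A hedged sketch of that suggestion: reading column names from cursor.pgresult instead of cursor.description. It assumes psycopg 3's low-level psycopg.pq.PGresult object and uses only its nfields attribute and fname() method, decoding names with the connection's client encoding (conn.info.encoding). The helper `column_names` and the `DSN` placeholder are hypothetical. Note that cursor.pgresult is None until a query has been executed.

```python
from typing import List, Optional, Protocol


class PGResultLike(Protocol):
    """The two members of psycopg.pq.PGresult used by this sketch."""

    nfields: int

    def fname(self, column_number: int) -> Optional[bytes]: ...


def column_names(pgresult: PGResultLike, encoding: str = "utf-8") -> List[Optional[str]]:
    """Return column names read directly from a libpq result."""
    names: List[Optional[str]] = []
    for i in range(pgresult.nfields):
        raw = pgresult.fname(i)  # bytes in the client encoding, or None
        names.append(raw.decode(encoding) if raw is not None else None)
    return names


# Usage against a real database (not run here; DSN is a placeholder):
#   with psycopg.connect(DSN) as conn, conn.cursor() as cur:
#       cur.execute("SELECT 1 AS id, 'x' AS name")
#       if cur.pgresult is not None:
#           print(column_names(cur.pgresult, conn.info.encoding))
```

This skips building the full Column objects that description exposes, which is presumably where the overhead mentioned in psycopg/psycopg#457 comes from.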

@pauloxnet
Contributor

> FYI, psycopg 3.1.5 has just been released, including several performance improvements.

I have installed psycopg 3.1.5 and run the tests. The timing difference has remained almost the same as with psycopg 3.1.4, and the DB setup times have increased, but this could be due to the load on my PC at that moment (even though it was actually idle).

I've updated the results table above.

Do you know if there is a way to run only the psycopg backend tests excluding the others?

felixxm force-pushed the psycopg3 branch 3 times, most recently from 86ec886 to 15f4669 on December 14, 2022 08:29
@felixxm
Member

felixxm commented Dec 14, 2022

I pushed docs changes.

felixxm force-pushed the psycopg3 branch 2 times, most recently from 9e645d1 to 9701e9a on December 14, 2022 11:42
- The `psycopg2`_ module is required for use as the database adapter when using
- GeoDjango with PostGIS.
+ The `psycopg`_ or `psycopg2`_ module is required for use as the database
+ adapter when using GeoDjango with PostGIS.

On Debian/Ubuntu, you are advised to install the following packages:
``postgresql-x.x``, ``postgresql-x.x-postgis``, ``postgresql-server-dev-x.x``,
Contributor

Suggested change
- ``postgresql-x.x``, ``postgresql-x.x-postgis``, ``postgresql-server-dev-x.x``,
+ ``postgresql-x``, ``postgresql-x-postgis-3``, ``postgresql-server-dev-x``,

PostgreSQL 12 is the minimum version supported by Django 4.2. Since PostgreSQL 10, the major version is a single number incremented by 1 with each major release, so I have updated the package names to the ones actually available in the Debian/Ubuntu repositories.
PS: there is no postgresql-x-postgis-2.x package.

Member

This is a cleanup in docs that can be backported. Please submit a separate PR with this change.

Contributor

I submitted the PR #16385

docs/ref/contrib/gis/install/postgis.txt (outdated, resolved)
django/db/backends/postgresql/operations.py (outdated, resolved)
django/db/backends/postgresql/base.py (outdated, resolved)
@felixxm
Copy link
Member

felixxm commented Dec 14, 2022

I'm going to update tests/requirements/py3.txt and CI jobs before the final build.

Member

carltongibson left a comment

Remarkably minimal. Nice 👍

A couple of comments...

docs/releases/4.2.txt (outdated, resolved)
docs/ref/databases.txt (outdated, resolved)
@felixxm
Member

felixxm commented Dec 14, 2022

I squashed commits and pushed edits to docs.

@pauloxnet
Contributor

I tested this Django branch with Psycopg 3.1.5 in a local project where I use GeoDjango a lot and everything seems to work.

The only error is this AttributeError in Django Debug Toolbar, 'Connection' object has no attribute 'status', raised at:
https://github.com/jazzband/django-debug-toolbar/blob/main/debug_toolbar/panels/sql/tracking.py#L150

But that needs to be fixed in Django Debug Toolbar itself.
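The error indicates that the toolbar reads psycopg2's connection.status, which psycopg 3's Connection does not have. One driver-agnostic option, sketched here as an assumption rather than the fix Django Debug Toolbar actually adopted, is to read connection.info.transaction_status, which both psycopg2 (2.8+) and psycopg 3 expose and which follows libpq's PQtransactionStatus() codes. The helper name `in_transaction` is hypothetical.

```python
# libpq PQtransactionStatus() codes; psycopg2 and psycopg 3 both report these
# through connection.info.transaction_status (psycopg 3 as an IntEnum).
PQTRANS_INTRANS = 2  # idle, inside a valid transaction block
PQTRANS_INERROR = 3  # idle, inside a failed transaction block


def in_transaction(conn) -> bool:
    """Hypothetical helper: is the connection currently inside a transaction block?

    Relies only on conn.info.transaction_status, so it does not need
    psycopg2's connection.status attribute.
    """
    return int(conn.info.transaction_status) in (PQTRANS_INTRANS, PQTRANS_INERROR)
```

Note that psycopg2's connection.status tracks the driver's own view of the connection rather than the server-reported status, so this is a close substitute, not an exact equivalent.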

docs/ref/databases.txt (outdated, resolved)
Thanks Simon Charette, Tim Graham, and Adam Johnson for reviews.

Co-authored-by: Florian Apolloner <florian@apolloner.eu>
Co-authored-by: Mariusz Felisiak <felisiak.mariusz@gmail.com>
sql.quote = _quote
def load(self, data):
    res = super().load(data)
    return res.replace(tzinfo=self.timezone)
Contributor

Is this line correct?

I'm doing something weird: using Django's connection params in a non-Django setting (I know it's not recommended, all warranty is off, regular Django is not affected). The program's timezone is different from the DB timezone. This line causes the time to be read incorrectly, because it transforms e.g. 2023-04-09 19:44:32.768813+03:00 to 2023-04-09 19:44:32.768813+00:00, which is a different point in time.

I wonder if this is what was really meant, or if this should be res.astimezone(self.timezone) (with special handling of None)? That would result in 2023-04-09 16:44:32.768813+00:00.

Or maybe this assumes the DB timezone == app timezone? But in that case, why is this needed at all?
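The difference between the two calls can be reproduced with the standard library alone. This sketch uses the timestamp from the comment above and assumes UTC as the target timezone; fixed offsets from datetime.timezone stand in for real time zones.

```python
from datetime import datetime, timedelta, timezone

# Aware datetime as it might come back from the DB connection (+03:00).
loaded = datetime(2023, 4, 9, 19, 44, 32, 768813,
                  tzinfo=timezone(timedelta(hours=3)))
target = timezone.utc  # assumed app timezone, different from the DB one

relabelled = loaded.replace(tzinfo=target)  # keeps wall-clock time, changes the instant
converted = loaded.astimezone(target)       # keeps the instant, changes wall-clock time

print(relabelled.isoformat())  # 2023-04-09T19:44:32.768813+00:00
print(converted.isoformat())   # 2023-04-09T16:44:32.768813+00:00
```

replace() only relabels the tzinfo, so it is correct only when the value is already in the target timezone; astimezone() performs an actual conversion.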

Member

Replacing it with return res or return res.astimezone(self.timezone) causes many tests to fail, e.g. in queries or postgres_tests.test_ranges.

> Or maybe this assumes the DB timezone == app timezone? But in that case, why is this needed at all?

As far as I'm aware, this is needed because datetimes are loaded without a timezone so we have to set it manually 🤔

Contributor

> Replacing it with return res or return res.astimezone(self.timezone) causes many tests to fail, e.g. in queries or postgres_tests.test_ranges.

Thanks for checking, I'll try it myself and investigate more.
If you're OK with trying a small tweak, that would save me the effort of setting up a Django dev environment. The docstring says "The timezone can be None too, in which case it will be chopped.", so I think it should be something like this:

res = super().load(data)
if self.timezone is None:
    # USE_TZ=False, convert to system local time zone and make naive.
    return res.astimezone(None).replace(tzinfo=None)
else:
    # USE_TZ=True, convert to configured timezone.
    return res.astimezone(self.timezone)

> As far as I'm aware, this is needed because datetimes are loaded without a timezone so we have to set it manually 🤔

As far as I can see, psycopg's TimestamptzLoader (the superclass) always returns an aware datetime in the DB connection's timezone.

Overall, looking at 4.1.x code, I don't see any messing with timezones other than setting the DB connection's timezone, so I'll try to understand why it's needed in psycopg3 in the first place.

Member

> Overall, looking at 4.1.x code, I don't see any messing with timezones other than setting the DB connection's timezone, so I'll try to understand why it's needed in psycopg3 in the first place.

This is probably required in the Django test suite because we're changing the TIME_ZONE setting.

Contributor

Looked into it, if anyone's interested (TL;DR - it's fine).

I believe the TimestamptzLoader is meant to replace this code in the psycopg2 backend's create_cursor():

def create_cursor(self, name=None):
    ... snipped ...
    cursor.tzinfo_factory = self.tzinfo_factory if settings.USE_TZ else None
    return cursor

def tzinfo_factory(self, offset):                                           
    return self.timezone

In psycopg2 the tzinfo_factory line has this effect when loading a timestamptz (see code):

  1. If USE_TZ = False, it returns a naive datetime in the DB connection's timezone, equivalent to reading an aware datetime from the DB and doing replace(tzinfo=None).
  2. If USE_TZ = True, it returns an aware datetime in the DB connection's timezone with its tzinfo replaced by Django's DB timezone, equivalent to doing replace(tzinfo=self.timezone).

i.e. same as TimestamptzLoader's res.replace(tzinfo=self.timezone) code.

So this indeed relies on the assumption that Django DB timezone == connection timezone, which Django ensures when setting up a new connection.

I think that once psycopg2 support is dropped, this assumption could be dropped, making things less dangerous. But until then, it's better to keep the psycopg2 and 3 behaviors the same.
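A small standard-library sketch of the equivalence described above and of the assumption it depends on. `emulate_loader` is a hypothetical name, and fixed-offset time zones stand in for real ones.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def emulate_loader(loaded: datetime, django_tz: Optional[tzinfo]) -> datetime:
    """Mimic the psycopg2 tzinfo_factory / psycopg 3 loader behaviour described above.

    `loaded` is an aware datetime already expressed in the connection's time zone.
    """
    if django_tz is None:
        # USE_TZ = False: drop tzinfo, keep the wall-clock time.
        return loaded.replace(tzinfo=None)
    # USE_TZ = True: relabel with Django's time zone without converting.
    return loaded.replace(tzinfo=django_tz)


conn_tz = timezone(timedelta(hours=3))
loaded = datetime(2023, 4, 9, 19, 44, tzinfo=conn_tz)

# Django sets the connection time zone to its own, so relabelling is harmless.
same = emulate_loader(loaded, timezone(timedelta(hours=3)))
# If the two time zones ever differ, the instant silently shifts.
shifted = emulate_loader(loaded, timezone.utc)
naive = emulate_loader(loaded, None)
```

Relabelling is only safe while Django keeps the connection time zone equal to its own; dropping that assumption would mean switching to a real conversion such as astimezone().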
